Introduction and Data Summary
The dataset used in this exploratory analysis consists of 38,113 observations of 81 features describing fuel economy data from 1984 to 2017, published by the Environmental Protection Agency (EPA). Of the 81 features in the dataset as provided, 8 contain no data at all, and several more contain high proportions of missing values. To avoid dealing with these incomplete features, this analysis focuses on the general attributes that are available for all records. The primary features evaluated in this analysis are vehicle class, transmission type, year, fuel efficiency (mpg), and fuel cost.
Image 01 shows the distribution of fuel efficiency in miles per gallon (MPG) for all vehicles included in the dataset. The largest share of vehicles appears to have an MPG below 25.
Image 02 shows the distribution of estimated annual fuel cost. According to the kaggle.com description of this dataset, annual fuel cost is estimated based on a few assumptions. It is assumed that a vehicle is driven 15,000 miles per year, with 55% of that driving done on city roads and 45% on highways. Additionally, it is assumed that fuel costs follow this price structure:
- $2.33/gallon for regular gasoline
- $2.58/gallon for mid-grade gasoline
- $2.82/gallon for premium gasoline
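As a rough illustration of how these assumptions combine, the sketch below estimates an annual fuel cost from a vehicle's city and highway MPG. The function name, the way gallons are split by road type, and the example MPG values are assumptions made for illustration; the EPA's exact calculation may differ.

```python
# Assumptions taken from the dataset description quoted above.
ANNUAL_MILES = 15_000
CITY_SHARE = 0.55
HIGHWAY_SHARE = 0.45
PRICE_PER_GALLON = {"regular": 2.33, "midgrade": 2.58, "premium": 2.82}


def estimate_annual_fuel_cost(city_mpg, highway_mpg, fuel_grade="regular"):
    """Hypothetical helper: gallons used on each road type times the fuel price."""
    city_gallons = ANNUAL_MILES * CITY_SHARE / city_mpg
    highway_gallons = ANNUAL_MILES * HIGHWAY_SHARE / highway_mpg
    return (city_gallons + highway_gallons) * PRICE_PER_GALLON[fuel_grade]


# Made-up example vehicle: 20 MPG city, 30 MPG highway, regular gasoline.
example_cost = estimate_annual_fuel_cost(20, 30)
print(round(example_cost, 2))
```

Under this sketch the example vehicle uses 412.5 gallons in the city and 225 gallons on the highway, so the cost is 637.5 gallons multiplied by $2.33.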
Methods
This dataset uses many categories in its transmission and class features. To make visualizations more effective, the categories were combined as much as possible to reduce the total number of unique categories. For example, dozens of transmission categories were condensed to just Automatic or Manual, and dozens of vehicle class categories were condensed to just “Car”, “Truck”, “Van”, or “Other”. This allows visualizations to describe how vehicle class and transmission interact with other attributes without overloading the viewer with information.
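The condensing step could look something like the sketch below. The keyword rules and the sample labels are assumptions for illustration only; the mapping actually used in this analysis is not documented here and may have handled individual categories differently.

```python
def condense_transmission(raw_label):
    """Collapse a detailed transmission label to 'Automatic' or 'Manual'.

    Assumes raw labels begin with the transmission type, e.g. 'Automatic 4-spd'.
    """
    label = raw_label.strip().lower()
    if label.startswith("auto"):
        return "Automatic"
    if label.startswith("man"):
        return "Manual"
    return None  # unrecognized labels are left unassigned


def condense_class(raw_label):
    """Collapse a detailed vehicle class label to Car, Truck, Van, or Other."""
    label = raw_label.lower()
    # Check 'van' first: 'Vans, Cargo Type' also contains the substring 'car'.
    if "van" in label:
        return "Van"
    if "truck" in label or "pickup" in label:
        return "Truck"
    if "car" in label:
        return "Car"
    return "Other"


print(condense_transmission("Automatic 4-spd"))  # Automatic
print(condense_class("Vans, Cargo Type"))        # Van
```

Ordering the keyword checks matters with substring rules like these, which is one reason a hand-written lookup table may be a safer choice for a small, fixed set of labels.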
Visualizations
Originally, the visualizations planned for this analysis were the following:
- Average Highway / City Fuel Efficiency By Year (Boxplot)
- Prevalence of Manual / Automatic Transmissions by Year Per Class (Barplot)
- Average Annual Fuel Cost by Year Per Class (Scatterplot)
While preparing the data and code for these visualizations, a few problems were identified, some of which are mentioned in the Methods section. First, presenting this data by year is cumbersome because the dataset spans 33 years. This is fine for examining continuous variables such as Fuel Cost or Fuel Efficiency, but 33 years is too many groups to split on when examining categorical variables. The solution was to create a new categorical attribute, year_group, which categorizes years into decades (80s, 90s, 2000s, 2010s). A similar issue was identified with the transmission and class features, which the dataset organizes into over 30 and over 40 categories respectively. This is far too many categories to fit cleanly into the visualizations described above, so those features were condensed into a smaller number of categories. Additionally, some vehicle class categories were removed entirely because the author could not reassign them to an appropriate condensed category.
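A minimal sketch of the decade grouping is shown below. The function name mirrors the year_group attribute, but the implementation is an assumption for illustration, not the code used in this analysis.

```python
def year_group(year):
    """Map a model year to its decade label: 80s, 90s, 2000s, or 2010s."""
    if not 1980 <= year <= 2019:
        raise ValueError(f"year {year} is outside the range covered by the labels")
    decade = year // 10 * 10
    # Decades before 2000 use the short form (80s, 90s); later ones keep the century.
    return f"{decade % 100}s" if decade < 2000 else f"{decade}s"


for sample_year in (1984, 1995, 2003, 2017):
    print(sample_year, year_group(sample_year))
```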
Visualization 1: Fuel Efficiency By Class Per Year
Image 03 shows how the distribution of fuel efficiency has changed over the last few decades. The median and IQR of fuel efficiency appear to have stayed relatively consistent over the years, but more recent decades contain a growing number of outliers. A plausible explanation is the increasing number of electric and other environmentally friendly vehicles introduced into the market over the last 20 years.
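For readers less familiar with boxplots, the sketch below computes the quantities discussed above (median, IQR, and outliers) using the common 1.5 × IQR rule. The sample values are made up, and the quartile method is an assumption; plotting libraries may compute quartiles slightly differently.

```python
import statistics


def boxplot_summary(values):
    """Return the median, IQR, and points outside the 1.5 * IQR fences."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {"median": median, "iqr": iqr, "outliers": outliers}


# Made-up MPG values: eight conventional vehicles and one very efficient one.
sample_mpg = [10, 12, 14, 16, 18, 20, 22, 24, 100]
summary = boxplot_summary(sample_mpg)
print(summary)
```

In this made-up sample, the single high value is flagged as an outlier while the median and IQR are determined by the other vehicles, which matches the pattern described for Image 03.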
Visualization 2: Annual Fuel Cost By Class By Year
Image 04 plots the distribution of estimated annual fuel costs over time, split by vehicle category. For this visualization, the original year feature was used instead of the new year_group so that year-by-year trends are visible. For all three vehicle categories shown, fuel cost appears to be relatively consistent over time. Notably, the “cars” category has the widest range of values, likely because it is the largest and most varied category. This visualization could be improved by examining the vehicle categories further and possibly dividing them into a more evenly distributed set of categories.
Visualization 3: Prevalence of Transmission Type by Class By Year
The final visualization (Image 05) plots the total prevalence of vehicle transmission types, separated by decade and colored by class. In all four decades, cars with automatic transmissions appear to be the most prevalent group. Additionally, the gap between manual and automatic prevalence appears to widen with each decade.
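The counts behind a chart like this can be sketched as a simple grouped tally. The records below are made up for illustration and do not reflect the actual dataset; the tuple layout (decade, transmission, class) is likewise an assumption.

```python
from collections import Counter

# Made-up records: (decade, transmission, vehicle class).
records = [
    ("80s", "Automatic", "Car"),
    ("80s", "Manual", "Car"),
    ("80s", "Automatic", "Truck"),
    ("2010s", "Automatic", "Car"),
    ("2010s", "Automatic", "Car"),
    ("2010s", "Automatic", "Van"),
    ("2010s", "Manual", "Car"),
]

# Count vehicles per (decade, transmission, class) group, as a bar chart would.
group_counts = Counter(records)

# Collapse over class to compare automatic and manual totals within each decade.
transmission_counts = Counter((decade, trans) for decade, trans, _ in records)
gap_by_decade = {
    decade: transmission_counts[(decade, "Automatic")] - transmission_counts[(decade, "Manual")]
    for decade in ("80s", "2010s")
}
print(gap_by_decade)
```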
Summary
The primary goal of this Mini Project is to practice some of the principles of data visualization and design that we have learned over the past several weeks. As the above visualizations were being prepared, attention was given to important aspects such as choice of chart type, use of color, and avoiding information overload. From the very beginning of the analysis, steps were taken to keep the visualizations from being too busy by condensing categories and splitting year into decades. Additionally, the color palette was chosen specifically to distinguish the vehicle categories, and color was not used where it would not provide information to the reader. Finally, all titles and axis labels were deliberately written (or excluded) to maximize information to the reader without sacrificing minimalism.